MEDOC: a Python wrapper to load MEDLINE into a local MySQL database

Authors

  • Emeric Dynomant
  • Mathilde Gorieu
  • Helene Perrin
  • Marion Denorme
  • Fabien Pichon
  • Arnaud Desfeux
Abstract

Since the MEDLINE database was released, the number of documents it indexes has risen every year. The National Institutes of Health (NIH) has developed several tools to query this corpus of scientific publications. However, given advances in big data, text mining and data science, the ability to build a local relational database containing all the metadata available in MEDLINE would help exploit these resources fully. MEDOC (MEdline DOwnloading Contrivance) is a Python program designed to download data from an FTP server and load all extracted information into a local MySQL database. MEDOC took 4 days and 17 hours to load the 26 million documents available on this server onto a standard computer. The resulting indexed relational database lets the user build complex queries that run quickly. All fields can thus be searched for the desired information, a task that is difficult to accomplish through the PubMed graphical interface. MEDOC is free and publicly available at https://github.com/MrMimic/MEDOC.
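
The parse-and-load idea described in the abstract can be illustrated with a minimal sketch. It is not MEDOC's code: the two-table schema, the function names and the sample record are hypothetical, the FTP download step is skipped, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server. The XML element names follow the general layout of MEDLINE/PubMed records, trimmed to a few fields.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample in the style of a MEDLINE/PubMed XML file.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>First example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>Second example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def create_schema(conn):
    """Create an assumed two-table schema (not MEDOC's actual schema)."""
    conn.executescript("""
        CREATE TABLE article (pmid INTEGER PRIMARY KEY, title TEXT, journal TEXT);
        CREATE TABLE author (pmid INTEGER REFERENCES article(pmid),
                             last_name TEXT, fore_name TEXT);
        CREATE INDEX idx_author_last_name ON author(last_name);
    """)


def load_articles(conn, xml_text):
    """Parse the XML text, insert one row per article and per author,
    and return the number of articles loaded."""
    root = ET.fromstring(xml_text)
    count = 0
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        conn.execute(
            "INSERT INTO article VALUES (?, ?, ?)",
            (pmid,
             citation.findtext("Article/ArticleTitle"),
             citation.findtext("Article/Journal/Title")),
        )
        for author in citation.iterfind("Article/AuthorList/Author"):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?)",
                (pmid, author.findtext("LastName"), author.findtext("ForeName")),
            )
        count += 1
    conn.commit()
    return count


def titles_by_author(conn, last_name):
    """Return the titles of all articles with an author of this last name,
    ordered by PMID."""
    rows = conn.execute(
        "SELECT a.title FROM article a JOIN author au ON au.pmid = a.pmid "
        "WHERE au.last_name = ? ORDER BY a.pmid",
        (last_name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(load_articles(connection, SAMPLE_XML), "articles loaded")
    print(titles_by_author(connection, "Doe"))
```

Once the records sit in indexed tables, a join such as the one in `titles_by_author` can filter on any stored field, which is the kind of query the abstract describes as hard to express through the PubMed graphical interface.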

Similar resources

CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer

SUMMARY Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human ex...

Free/Libre Open Source Software users’ local Face-to-Face Meetings in career-related activities

While prior studies on Free/Libre Open Source Software (FLOSS) users have mainly looked at FLOSS users in the context of FLOSS communities, little is known about whether and how FLOSS communities play a role in the context of its users’ careers. This study takes an inductive approach to investigate the phenomena that are emergent and poorly understood. As promising themes emerge from a pilot st...

A production system for massive data processing in ILCSoft

This memo presents a new production system for massive data processing on the Grid and other large computing facilities. The system is primarily targeted at being used with programs from the ILCSoft software framework developed partly within the EUDET project. The new system is written in Python and centered around a MySQL database. It provides the functionality that is needed for the submissio...

Design & Implementation of Jmeter Framework for Performance Comparison in Ruby, PHP, & Python Web Applications

The absence of the broad performance comparison for websites in multiple frameworks has been a major deterrent in this area. There is a need of preemptive charting which then can be referred to before selecting a framework for web development. The use of best practices is crucial and worthwhile, albeit the performance of an application can’t be emasculated. While the former, in general, employs DRY (...

ND: A Comprehensive Network Administration and Analysis Tool

ND is a software tool developed by the Computing and Information Services Network Group at Texas A&M University (TAMU) to aid in the engineering and operation of the campus network. This tool was developed in response to the tremendous growth of the TAMU campus network over the last ten years. ND is designed to provide high-level application functionality while retaining the power and flexibili...

Journal title:
  • CoRR

Volume abs/1710.06590  Issue 

Pages  -

Publication date 2017